Whole-genome sequence assembly for mammalian genomes: Arachne 2.

Authors

  • David B Jaffe
  • Jonathan Butler
  • Sante Gnerre
  • Evan Mauceli
  • Kerstin Lindblad-Toh
  • Jill P Mesirov
  • Michael C Zody
  • Eric S Lander
Abstract

We previously described the whole-genome assembly program Arachne, presenting assemblies of simulated data for small to mid-sized genomes. Here we describe algorithmic adaptations to the program, allowing for assembly of mammalian-size genomes, and also improving the assembly of smaller genomes. Three principal changes were simultaneously made and applied to the assembly of the mouse genome, during a six-month period of development: (1) Supercontigs (scaffolds) were iteratively broken and rejoined using several criteria, yielding a 64-fold increase in length (N50), and apparent elimination of all global misjoins; (2) gaps between contigs in supercontigs were filled (partially or completely) by insertion of reads, as suggested by pairing within the supercontig, increasing the N50 contig length by 50%; (3) memory usage was reduced fourfold. The outcome of this mouse assembly and its analysis are described in (Mouse Genome Sequencing Consortium 2002).
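
The abstract reports its improvements in terms of N50 length. As a minimal sketch of that statistic, assuming the common definition (the largest length L such that sequences of length at least L together cover at least half of the total assembled bases), it could be computed as follows. The function name `n50` and the sample lengths are illustrative assumptions and are not taken from Arachne.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or supercontig lengths.

    Assumes the common definition: walk the lengths from longest to
    shortest and return the length at which the running total first
    reaches half of the sum of all lengths.
    """
    if not lengths:
        raise ValueError("n50() requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in bases), for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

Because N50 depends only on the length distribution, joining contigs or supercontigs raises it even when the total number of assembled bases stays the same, which is why it is used to summarize contiguity.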

Similar resources

Improving Genome Assembly without Sequencing

Assembly of genomes from whole-genome sequencing (WGS) projects is one of the most complex computational problems in genomics. WGS assemblers such as Arachne [1] and Celera Assembler [2] are able to process data from millions of individual sequence "reads" and construct an accurate representation of a genome. These assemblies are in the form of contigs (contiguous stretches of DNA sequence) and...

ARACHNE: a whole-genome shotgun assembler.

We describe a new computer system, called ARACHNE, for assembling genome sequence using paired-end whole-genome shotgun reads. ARACHNE has several key features, including an efficient and sensitive procedure for finding read overlaps, a procedure for scoring overlaps that achieves high accuracy by correcting errors before assembly, read merger based on forward-reverse links, and detection of re...

Supporting Text

Genome Sequencing and Assembly. Initial shotgun libraries were generated and sequenced at the Broad by the Microbial Sequencing Center yielding 76,452 (PA2192) and 77,884 (C3719) sequences (paired-reads). The reads were assembled using ARACHNE (1, 2). After refinement, final assemblies contained 82 (PA2192) and 124 (C3719) contigs with a total sequence spanning single scaffolds of 6.83 Mb (PA21...

Analysis of segmental duplications and genome assembly in the mouse.

Limited comparative studies suggest that the human genome is particularly enriched for recent segmental duplications. The extent of segmental duplications in other mammalian genomes is unknown and confounded by methodological differences in genome assembly. Here, we present a detailed analysis of recent duplication content within the mouse genome using a whole-genome assembly comparison method ...

The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla

The analysis of the first plant genomes provided unexpected evidence for genome duplication events in species that had previously been considered as true diploids on the basis of their genetics. These polyploidization events may have had important consequences in plant evolution, in particular for species radiation and adaptation and for the modulation of functional capacities. Here we report a...

Journal title:
  • Genome Research

Volume 13, Issue 1

Pages: -

Publication date: 2003